Latent Dirichlet allocation
In natural language processing, latent Dirichlet allocation (LDA) is a generative statistical model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar. For example, if the observations are words collected into documents, it posits that each document is a mixture of a small number of topics and that each word's presence is attributable to one of the document's topics. LDA is an example of a topic model and was first presented as a graphical model for topic discovery by David Blei, Andrew Ng, and Michael I. Jordan in 2003.

== Topics in LDA ==

In LDA, each document may be viewed as a mixture of various topics. This is similar to probabilistic latent semantic analysis (pLSA), except that in LDA the topic distribution is assumed to have a Dirichlet prior. In practice, this results in more reasonable mixtures of topics in a document. It has been noted, however, that the pLSA model is equivalent to the LDA model under a uniform Dirichlet prior distribution.

For example, an LDA model might have topics that can be classified as CAT_related and DOG_related. A topic has probabilities of generating various words, such as ''milk'', ''meow'', and ''kitten'', which can be classified and interpreted by the viewer as "CAT_related". Naturally, the word ''cat'' itself will have high probability given this topic. The DOG_related topic likewise has probabilities of generating each word: ''puppy'', ''bark'', and ''bone'' might have high probability. Words without special relevance, such as ''the'' (see function word), will have roughly even probability across topics (or can be placed into a separate category).

A topic is neither semantically nor epistemologically strongly defined. It is identified on the basis of supervised labeling and (manual) pruning based on the words' likelihood of co-occurrence. A lexical word may occur in several topics with different probabilities, but with a different typical set of neighboring words in each topic.

Each document is assumed to be characterized by a particular set of topics. This is akin to the standard bag-of-words model assumption, and it makes the individual words exchangeable.
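The generative story above is concrete enough to simulate directly. The following is a minimal sketch in Python, assuming a hypothetical eight-word vocabulary, two hand-built topic-word distributions (loosely the CAT_related and DOG_related topics described above), and an arbitrary symmetric Dirichlet hyperparameter; none of these values come from the original paper.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical vocabulary and two hand-built topic-word distributions,
# loosely matching the CAT_related and DOG_related topics above.
vocab = ["cat", "meow", "kitten", "milk", "dog", "puppy", "bark", "the"]
topics = np.array([
    [0.30, 0.25, 0.20, 0.15, 0.00, 0.00, 0.00, 0.10],  # "CAT_related"
    [0.00, 0.00, 0.00, 0.05, 0.30, 0.25, 0.30, 0.10],  # "DOG_related"
])
alpha = np.array([0.5, 0.5])  # assumed symmetric Dirichlet hyperparameter

def generate_document(n_words):
    """One pass of LDA's generative story: draw a per-document topic
    mixture, then for each word draw a topic, then a word from it."""
    theta = rng.dirichlet(alpha)             # per-document topic mixture
    doc = []
    for _ in range(n_words):
        z = rng.choice(len(alpha), p=theta)  # topic assignment for this word
        doc.append(vocab[rng.choice(len(vocab), p=topics[z])])
    return doc

print(generate_document(10))
```

Because ''theta'' is redrawn per document, some generated documents lean heavily toward one topic while others mix both, which is exactly the "mixture of a small number of topics" behavior described above.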
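In practice one goes the other way: given documents, infer the topics. As one possible illustration (the four-document corpus is invented, but LatentDirichletAllocation and CountVectorizer are scikit-learn's actual estimator and vectorizer classes), the snippet below fits a two-topic model and prints each topic's most probable words, the kind of output a viewer would then interpret and label.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Invented toy corpus for illustration.
docs = [
    "the cat drinks milk and the kitten meows",
    "a kitten likes milk and the cat purrs",
    "the dog buries a bone and the puppy barks",
    "a puppy chews the bone while the dog barks",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)   # bag-of-words count matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)    # per-document topic mixtures

# Most probable words per topic; a human reader would interpret and
# label these, e.g. as "CAT_related" or "DOG_related".
words = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = weights.argsort()[::-1][:4]
    print(f"topic {k}:", [words[i] for i in top])

print(doc_topics.round(2))  # each row is one document's topic mixture
```

Note that the bag-of-words step discards word order entirely, reflecting the exchangeability assumption mentioned above.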
Source: Wikipedia, the free encyclopedia. Read the full article "Latent Dirichlet allocation" at Wikipedia.